Statistical Modelling of Speech Segment Duration by Constrained Tree Regression

Authors

Naoto IWAHASHI

Yoshinori SAGISAKA

Abstract

This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, referred to as Constrained Tree Regression (CTR), can represent the complex effects of prosodic control factors with a moderate amount of training data. It is based on recursive splits of the predictor variable space and the partial imposition of linear independence constraints among predictor variables. It incorporates both linear regression and tree regression with categorical predictor variables, which have conventionally been used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to account for the hierarchical structure of prosody control. The new method is applied to the modelling of speech segmental duration. Experimental results show that the proposed method yields better duration models than linear and tree regressions with the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.

Key words: speech segmental duration, statistical modelling, regression
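The idea of combining tree splits with linear regression over categorical predictors can be illustrated with a minimal sketch. This is not the authors' CTR algorithm: it only compares a single additive (linear) model against a one-level split whose two sides each get their own additive model. The factor names, the synthetic durations, and all function names are assumptions made for illustration.

```python
import numpy as np

def design(X, n_levels):
    """Dummy-code categorical columns (first level of each factor is the baseline)."""
    cols = [np.ones(len(X))]  # intercept
    for j, k in enumerate(n_levels):
        for level in range(1, k):
            cols.append((X[:, j] == level).astype(float))
    return np.column_stack(cols)

def fit_additive(X, y, n_levels):
    """Least-squares additive model: factor effects are summed, with no interactions."""
    coef, *_ = np.linalg.lstsq(design(X, n_levels), y, rcond=None)
    return coef

def sse(X, y, coef, n_levels):
    """Sum of squared errors of an additive model on (X, y)."""
    return float(np.sum((y - design(X, n_levels) @ coef) ** 2))

def best_split(X, y, n_levels, min_leaf=2):
    """Try every 'factor == level' split and fit an additive model on each side.
    Returns the split with the lowest total SSE (a single tree level only)."""
    best = None
    for j, k in enumerate(n_levels):
        for level in range(k):
            mask = X[:, j] == level
            if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
                continue
            left = fit_additive(X[mask], y[mask], n_levels)
            right = fit_additive(X[~mask], y[~mask], n_levels)
            err = (sse(X[mask], y[mask], left, n_levels)
                   + sse(X[~mask], y[~mask], right, n_levels))
            if best is None or err < best["sse"]:
                best = {"factor": j, "level": level, "sse": err}
    return best

# Hypothetical data: factor 0 = stress (2 levels), factor 1 = position (3 levels).
# The effect of position reverses with stress, so the factors interact.
n_levels = [2, 3]
grid = [(a, b) for a in range(2) for b in range(3)] * 3
X = np.array(grid)
y = np.array([80.0 + 10.0 * b if a == 0 else 120.0 - 15.0 * b for a, b in grid])

global_sse = sse(X, y, fit_additive(X, y, n_levels), n_levels)
best = best_split(X, y, n_levels)
print("additive model SSE:", global_sse)
print("best split:", best)
```

On this toy data the single additive model cannot capture the interaction, while splitting on the first factor lets each side be fitted by its own additive model. The comparison here does not equalise the number of free parameters, which the experiments described in the abstract do.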

Similar resources

An Overview of Prosodic Modelling for Croatian Speech Synthesis

To include prosody in text-to-speech (TTS) systems, prosody knowledge needs to be acquired, represented and incorporated. The two main prosodic features for TTS modelling are duration and F0 contour. There are various approaches to modelling these features, and they can be categorized into three main groups: rule-based, statistical and minimalistic. Some...

Bayesian Modelling Of Vowel Segment Duration For Text-to-Speech Synthesis Using Distinctive Features

We apply a Bayesian belief network (BN) approach to vowel duration modelling, whereby vowel segment duration is modelled as a hybrid Bayesian network consisting of discrete and continuous nodes, with the nodes in the network representing linguistic factors that affect segment duration. Factor interaction is modelled in a concise way by causal relationships among the nodes in a directed acyclic ...

MIMIC : a voice-adaptive phonetic-tree speech synthesiser

This paper presents Mimic : a decision-tree based concatenative voice adaptive text to speech synthesiser. Mimic integrates text to speech synthesis (TTS) with speech recognition and speaker adaptation. Speech is synthesised from concatenation of triphone synthesis units. The triphone units are obtained from clusters of training examples modelled, labelled and segmented using clustered HMMs and...

Learning duration

In this paper, we investigate the possibilities of enhancing statistical modelling of segment duration for speech synthesis. In particular, we look at the effects of gradually increasing the size of the training data and at specific problems of phonetic coding. We show that questions arising due to the inherent mismatch between canonical phonemic representation and phonetic realisation are best answered by sta...

Using bayesian belief networks for model duration in text-to-speech systems

The problems of database imbalance and factor interaction make modelling of segment duration in text-to-speech systems a challenging task. We therefore propose a probabilistic Bayesian belief network (BN) approach to tackle data sparsity and factor interaction problems. The belief network approach makes good estimations in cases of missed or incomplete data. Also, it captures factor interaction...

Publication date: 2000
